Feature Selection in Frequent Subgraphs (Feature Selektion auf häufigen Subgraphen)
Abstract
Bioinformatics is producing a wealth of network data, ranging from molecular graphs to complex gene expression networks. To distinguish different classes of graphs, such as different functional classes of proteins, one common approach is to search for frequent subgraphs. However, this method quickly generates thousands or even millions of frequent subgraphs. For biology and data mining, the interesting question is then how to identify the most informative subgraphs among this vast number of features. The two goals of this master's thesis are to advance a new subgraph mining method and to design novel approaches to this feature selection problem.
Similar resources
Near-optimal Supervised Feature Selection among Frequent Subgraphs
Graph classification is an increasingly important step in numerous application domains, such as function prediction of molecules and proteins, computerised scene analysis, and anomaly detection in program flows. Among the various approaches proposed in the literature, graph classification based on frequent subgraphs is a popular branch: Graphs are represented as (usually binary) vectors, with c...
Combining near-optimal feature selection with gSpan
Graph classification is an increasingly important step in numerous application domains, such as function prediction of molecules and proteins, computerised scene analysis, and anomaly detection in program flows. Among the various approaches proposed in the literature, graph classification based on frequent subgraphs is a popular branch: Graphs are represented as (usually binary) vectors, with c...
Discriminative frequent subgraph mining with optimality guarantees
The goal of frequent subgraph mining is to detect subgraphs that frequently occur in a dataset of graphs. In classification settings, one is often interested in discovering discriminative frequent subgraphs, whose presence or absence is indicative of the class membership of a graph. In this article, we propose an approach to feature selection on frequent subgraphs, called CORK, that combines tw...
Towards an Efficient Discovery of the Topological Representative Subgraphs
With the emergence of graph databases, the task of frequent subgraph discovery has been extensively addressed. Although the approaches proposed in the literature have made this task feasible, the number of discovered frequent subgraphs is still too high for them to be used efficiently in any further exploration. Feature selection based on exact or approximate structural similarity is a way to reduce th...
Mining Interpretable Subgraphs
We present a measure that estimates the interpretability of a frequent subgraph. We show that a feature selection algorithm that uses this measure creates a set of features that is smaller than, and as predictive as, the features obtained in earlier studies. A significant number of the selected features turn out to be trees or cyclic graphs, leading us to the conclusion that such features are not as ...